In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt # Visualization Library
import seaborn as sns
In [2]:
df=pd.read_csv("Click.csv")
In [3]:
df.head()
Out[3]:
      Daily Time Spent on Site   Age  Area Income  Daily Internet Usage  \
0                        62.26  32.0     69481.85                172.83   
1                        41.73  31.0     61840.26                207.17   
2                        44.40  30.0     57877.15                172.83   
3                        59.88  28.0     56180.93                207.17   
4                        49.21  30.0     54324.73                201.58   

                               Ad Topic Line             City  Gender  \
0            Decentralized real-time circuit         Lisafort    Male   
1             Optional full-range projection  West Angelabury    Male   
2        Total 5thgeneration standardization        Reyesfurt  Female   
3                Balanced empowering success      New Michael  Female   
4        Total 5thgeneration standardization     West Richard  Female   

                           Country            Timestamp  Clicked on Ad  
0     Svalbard & Jan Mayen Islands  2016-06-09 21:43:05              0  
1                        Singapore  2016-01-16 17:56:05              0  
2                       Guadeloupe  2016-06-29 10:50:45              0  
3                           Zambia  2016-06-21 14:32:32              0  
4                            Qatar  2016-07-21 10:54:35              1  
In [4]:
df.tail()
Out[4]:
      Daily Time Spent on Site   Age  Area Income  Daily Internet Usage  \
9995                     41.73  31.0     61840.26                207.17   
9996                     41.73  28.0     51501.38                120.49   
9997                     55.60  39.0     38067.08                124.44   
9998                     46.61  50.0     43974.49                123.13   
9999                     46.61  43.0     60575.99                198.45   

                               Ad Topic Line             City  Gender  \
9995          Profound executive flexibility  West Angelabury    Male   
9996          Managed zero tolerance concept      Kennedyfurt    Male   
9997          Intuitive exuding service-desk      North Randy  Female   
9998        Realigned content-based leverage   North Samantha  Female   
9999  Optimized upward-trending productivity     Port Jeffrey    Male   

                           Country            Timestamp  Clicked on Ad  
9995                     Singapore  2016-01-03 03:22:15              1  
9996                    Luxembourg  2016-05-28 12:20:15              0  
9997                         Egypt  2016-01-05 11:53:17              0  
9998                        Malawi  2016-04-04 07:07:46              1  
9999      Northern Mariana Islands  2016-04-03 21:13:46              1  
In [5]:
df.info()  # info is a method: a bare df.info only echoes the bound method and the DataFrame repr
In [6]:
df.describe()  # describe is a method: call it to get summary statistics of the numeric columns
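
A note on `info` and `describe`: both are methods, so they only produce a summary when called with parentheses. The sketch below illustrates the difference on a small stand-in frame built from the numeric values of the five `df.head()` rows; `sample`, `summary`, `buffer` and `info_text` are names I'm introducing here for illustration, not part of the notebook.

```python
import io

import pandas as pd

# Stand-in for df: two numeric columns of the five rows shown by df.head().
sample = pd.DataFrame({
    "Daily Time Spent on Site": [62.26, 41.73, 44.40, 59.88, 49.21],
    "Age": [32.0, 31.0, 30.0, 28.0, 30.0],
})

# Without parentheses the attribute is just the method object, not a summary.
print(callable(sample.describe))

# With parentheses, describe() returns a DataFrame of summary statistics
# (count, mean, std, min, quartiles, max) for the numeric columns.
summary = sample.describe()
print(summary)

# info() prints dtypes and non-null counts and returns None; here the text
# is captured in a buffer so it can be inspected afterwards.
buffer = io.StringIO()
sample.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)
```

On the real data the same calls would be `df.describe()` and `df.info()`; the five-row sample says nothing about the statistics of the full 10000 rows.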
In [7]:
df.columns
Out[7]:
Index(['Daily Time Spent on Site', 'Age', 'Area Income',
       'Daily Internet Usage', 'Ad Topic Line', 'City', 'Gender', 'Country',
       'Timestamp', 'Clicked on Ad'],
      dtype='object')
In [8]:
df.shape
Out[8]:
(10000, 10)
In [9]:
df.size
Out[9]:
100000
In [10]:
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
pio.templates.default
Out[10]:
'plotly'

The "Clicked on Ad" column contains 0 and 1 values, where 0 means not clicked and 1 means clicked. I'll map 0 to "No" and 1 to "Yes":

In [11]:
df['Clicked on Ad']=df["Clicked on Ad"].map({0:"No",1:"Yes"})
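
One thing to watch with `map` in a notebook: values that are not keys of the dictionary become NaN, so running the cell above a second time would wipe the column. Here is a minimal sketch of that behaviour on a three-value stand-in Series (`clicks`, `once`, `twice` and `safe` are illustrative names), together with `replace`, which I'm suggesting as one possible alternative because it leaves unmatched values untouched.

```python
import pandas as pd

labels = {0: "No", 1: "Yes"}

# Stand-in for df["Clicked on Ad"] before any transformation.
clicks = pd.Series([0, 1, 1], name="Clicked on Ad")

# First pass: every 0/1 value is a key of the dict, so all values are mapped.
once = clicks.map(labels)

# Second pass: "No"/"Yes" are not keys of the dict, so map() yields NaN.
twice = once.map(labels)

# replace() only touches values that match a key, so a re-run is harmless.
safe = clicks.replace(labels).replace(labels)

print(once.tolist())
print(twice.isna().all())
print(safe.tolist())
```

If the column does end up as NaN after an accidental re-run, reloading the CSV and running the cell once restores it.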

Now let's analyze the click-through rate based on the time spent by the users on the website:

In [12]:
fig=px.box(df,
          x="Daily Time Spent on Site",
          color="Clicked on Ad",
          title="Click through rate based on time spent on site",
          color_discrete_map={'Yes':'Blue',
                             'No':'Red'})
fig.update_traces(quartilemethod="exclusive")
fig.show()

From the above graph, we can see that the users who spend more time on the website click more on ads. Now let's analyze the click-through rate based on the daily internet usage of the users:

In [13]:
fig=px.box(df,
          x="Daily Internet Usage",
          color="Clicked on Ad",
          title="Click through rate based on Daily Internet Usage",
          color_discrete_map={'Yes':'Blue',
                             'No':'Red'})
fig.update_traces(quartilemethod="exclusive")
fig.show()

From the above graph, we can see that the users with high internet usage click less on ads compared to the users with low internet usage. Now let's analyze the click-through rate based on the age of the users:

In [14]:
fig=px.box(df,
          x="Age",
          color="Clicked on Ad",
          title="Click through rate based on Age",
          color_discrete_map={'Yes':'Blue',
                             'No':'Red'})
fig.update_traces(quartilemethod="exclusive")
fig.show()

From the above graph, we can see that users around 40 years old click more on ads compared to users around 27-36 years old. Now let's analyze the click-through rate based on the income of the users:

In [15]:
fig=px.box(df,
          x="Area Income",
          color="Clicked on Ad",
          title="Click Through Rate based on Income",
          color_discrete_map={'Yes':'Blue',
                             'No':'Red'})
fig.update_traces(quartilemethod="exclusive")
fig.show()
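
The readings taken from the box plots above can be cross-checked numerically by comparing group medians. The sketch below does this on a stand-in frame built only from the ten rows printed by `df.head()` and `df.tail()`, with the 0/1 labels already written as "No"/"Yes"; `sample`, `numeric_cols` and `medians` are names I'm introducing for illustration. Ten rows are far too few to confirm or contradict the plots, so this only shows the mechanics.

```python
import pandas as pd

# Stand-in for df: the ten rows shown by df.head() and df.tail().
sample = pd.DataFrame({
    "Daily Time Spent on Site": [62.26, 41.73, 44.40, 59.88, 49.21,
                                 41.73, 41.73, 55.60, 46.61, 46.61],
    "Daily Internet Usage": [172.83, 207.17, 172.83, 207.17, 201.58,
                             207.17, 120.49, 124.44, 123.13, 198.45],
    "Age": [32.0, 31.0, 30.0, 28.0, 30.0, 31.0, 28.0, 39.0, 50.0, 43.0],
    "Area Income": [69481.85, 61840.26, 57877.15, 56180.93, 54324.73,
                    61840.26, 51501.38, 38067.08, 43974.49, 60575.99],
    "Clicked on Ad": ["No", "No", "No", "No", "Yes",
                      "Yes", "No", "No", "Yes", "Yes"],
})

numeric_cols = ["Daily Time Spent on Site", "Daily Internet Usage",
                "Age", "Area Income"]

# One row per group ("No", "Yes"), one column per numeric feature.
medians = sample.groupby("Clicked on Ad")[numeric_cols].median()
print(medians)
```

In the notebook itself, replacing `sample` with `df` would give the medians behind the four box plots.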

Calculating Click Through Rate of Ads

In [16]:
df['Clicked on Ad'].value_counts()
Out[16]:
Clicked on Ad
No     5083
Yes    4917
Name: count, dtype: int64

So 4917 out of 10000 users clicked on the ads. Let's calculate the Click Through Rate:

In [17]:
click_through_rate=4917/10000*100
print(click_through_rate)
49.17
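
The same figure can be derived from the column itself instead of typing the counts by hand, which keeps the result correct if the data changes. This is a minimal sketch, assuming the column already holds "Yes"/"No" labels; `clicked` is a stand-in Series rebuilt from the counts in Out[16], not the real column.

```python
import pandas as pd

# Stand-in for df["Clicked on Ad"], rebuilt from the counts in Out[16].
clicked = pd.Series(["No"] * 5083 + ["Yes"] * 4917, name="Clicked on Ad")

# The mean of a boolean mask is the fraction of True values,
# i.e. clicks divided by total impressions.
click_through_rate = (clicked == "Yes").mean() * 100
print(round(click_through_rate, 2))
```

With the real data, `clicked` would simply be `df["Clicked on Ad"]`.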